A Non-deterministic Tokeniser for Finite-State Parsing

Authors

  • Jean-Pierre Chanod
  • Pasi Tapanainen

Abstract

This paper describes a non-deterministic tokeniser implemented and used for the development of a French finite-state grammar. The tokeniser includes a finite-state automaton for simple tokens and a lexical transducer that encodes a wide variety of multiword expressions, associated with multiple lexical descriptions when required.
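
The abstract does not describe the implementation, so the Python sketch below only illustrates the general idea under stated assumptions. A deterministic pass splits text into simple tokens. A lookup of multiword expressions then adds alternative segmentations, each with one or more lexical descriptions, so the output is a set of tokenisations rather than a single one. The lexicon entries, the tag names (including "SIMPLE"), and the functions `simple_tokens` and `tokenisations` are hypothetical. A regular expression stands in for the finite-state automaton for simple tokens, and a plain dictionary stands in for the lexical transducer.

```python
import re
from typing import Dict, List, Tuple

# (surface form, lexical description); "SIMPLE" is a hypothetical tag marking
# a token left for later morphological analysis.
Token = Tuple[str, str]

# Hypothetical multiword-expression lexicon standing in for the lexical
# transducer: each entry maps a sequence of simple tokens to one or more
# alternative lexical descriptions.
MWE_LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("pomme", "de", "terre"): ["NOUN"],
    ("bien", "que"): ["CONJ"],
    ("a", "priori"): ["ADV", "NOUN"],  # ambiguous: two descriptions
}


def simple_tokens(text: str) -> List[str]:
    """Deterministic first pass: words, and each punctuation mark on its own."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenisations(tokens: List[str], start: int = 0) -> List[List[Token]]:
    """Return every segmentation of tokens[start:] into simple tokens and MWEs."""
    if start == len(tokens):
        return [[]]
    results: List[List[Token]] = []
    # Alternative 1: keep the simple token on its own.
    for rest in tokenisations(tokens, start + 1):
        results.append([(tokens[start], "SIMPLE")] + rest)
    # Alternative 2: every multiword expression starting here, once per description.
    for mwe, tags in MWE_LEXICON.items():
        end = start + len(mwe)
        if tuple(tokens[start:end]) == mwe:
            for rest in tokenisations(tokens, end):
                for tag in tags:
                    results.append([(" ".join(mwe), tag)] + rest)
    return results


if __name__ == "__main__":
    for path in tokenisations(simple_tokens("il vient a priori")):
        print(path)
```

For "il vient a priori" the sketch yields three tokenisations: four simple tokens, and two readings where "a priori" is a single unit, described either as an adverb or as a noun. Enumerating every path can grow exponentially with the number of overlapping expressions. This is one reason a finite-state implementation would more plausibly keep the alternatives packed in a single automaton for the parser to disambiguate.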

Publication date: 1996